AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing SystemsFeb-15-2026, 17:43:45 GMT

Appendix to Weakly Coupled Deep Q-Networks A Proofs

We prove part the first part of the proposition (weak duality) by induction. It is well-known that, by the value iteration algorithm's convergence, Q Consider a state s S and a feasible action a A (s). We use an induction proof. B (w), which follows by the convergence of value iteration.A.2 Proof of Theorem 1 Proof. Now we state the following lemma.

artificial intelligence, machine learning, subproblem, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing SystemsDec-26-2025, 07:21:09 GMT

Weakly Coupled Deep Q-Networks

We propose weakly coupled deep Q-networks (WCDQN), a novel deep reinforcement learning algorithm that enhances performance in a class of structured problems called weakly coupled Markov decision processes (WCMDP). WCMDPs consist of multiple independent subproblems connected by an action space constraint, which is a structural property that frequently emerges in practice. Despite this appealing structure, WCMDPs quickly become intractable as the number of subproblems grows. WCDQN employs a single network to train multiple DQN ``subagents,'' one for each subproblem, and then combine their solutions to establish an upper bound on the optimal action value.

deep q-network, name change, subproblem, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing SystemsDec-24-2025, 08:36:32 GMT

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with \epsilon -Greedy Exploration

This paper provides a theoretical understanding of deep Q-Network (DQN) with the $\varepsilon$-greedy exploration in deep reinforcement learning.Despite the tremendous empirical achievement of the DQN, its theoretical characterization remains underexplored.First, the exploration strategy is either impractical or ignored in the existing analysis. Second, in contrast to conventional Q-learning algorithms, the DQN employs the target network and experience replay to acquire an unbiased estimation of the mean-square Bellman error (MSBE) utilized in training the Q-network. However,the existing theoretical analysis of DQNs lacks convergence analysis or bypasses the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient. This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with $\epsilon$-greedy policy. We prove an iterative procedure with decaying $\epsilon$ converges to the optimal Q-value function geometrically. Moreover, a higher level of $\epsilon$ values enlarges the region of convergence but slows down the convergence, while the opposite holds for a lower level of $\epsilon$ values. Experiments justify our established theoretical insights on DQNs.

convergence and sample complexity analysis, deep q-network, epsilon -greedy exploration, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing SystemsNov-21-2025, 14:56:05 GMT

Shallow Updates for Deep Reinforcement Learning

deep reinforcement learning, name change, shallow update, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Rahmani, Iman, Yazdannik, Saman, Tayefi, Morteza, Roshanian, Jafar

An Integrated Approach to Neural Architecture Search for Deep Q-Networks

arXiv.org Artificial IntelligenceOct-24-2025

The performance of deep reinforcement learning agents is fundamentally constrained by their neural network architecture, a choice traditionally made through expensive hyperparameter searches and then fixed throughout training. This work investigates whether online, adaptive architecture optimization can escape this constraint and outperform static designs. We introduce NAS-DQN, an agent that integrates a learned neural architecture search controller directly into the DRL training loop, enabling dynamic network reconfiguration based on cumulative performance feedback. We evaluate NAS-DQN against three fixed-architecture baselines and a random search control on a continuous control task, conducting experiments over multiple random seeds. Our results demonstrate that NAS-DQN achieves superior final performance, sample efficiency, and policy stability while incurring negligible computational overhead. Critically, the learned search strategy substantially outperforms both undirected random architecture exploration and poorly-chosen fixed designs, indicating that intelligent, performance-guided search is the key mechanism driving success. These findings establish that architecture adaptation is not merely beneficial but necessary for optimal sample efficiency in online deep reinforcement learning, and suggest that the design of RL agents need not be a static offline choice but can instead be seamlessly integrated as a dynamic component of the learning process itself.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

2510.19872

Country: Asia > Middle East > Iran (0.29)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Shahid, Abrar, Ishum, Ibteeker Mahir, Haque, AKM Tahmidul, Rahman, M Sohel, Islam, A. B. M. Alim Al

Adversarial Reinforcement Learning for Offensive and Defensive Agents in a Simulated Zero-Sum Network Environment

arXiv.org Artificial IntelligenceOct-8-2025

This paper presents a controlled study of adversarial reinforcement learning in network security through a custom OpenAI Gym environment that models brute-force attacks and reactive defenses on multi-port services. The environment captures realistic security trade-offs including background traffic noise, progressive exploitation mechanics, IP-based evasion tactics, honeypot traps, and multi-level rate-limiting defenses. Competing attacker and defender agents are trained using Deep Q-Networks (DQN) within a zero-sum reward framework, where successful exploits yield large terminal rewards while incremental actions incur small costs. Through systematic evaluation across multiple configurations (varying trap detection probabilities, exploitation difficulty thresholds, and training regimens), the results demonstrate that defender observability and trap effectiveness create substantial barriers to successful attacks. The experiments reveal that reward shaping and careful training scheduling are critical for learning stability in this adversarial setting. The defender consistently maintains strategic advantage across 50,000+ training episodes, with performance gains amplifying when exposed to complex defensive strategies including adaptive IP blocking and port-specific controls. Complete implementation details, reproducible hyperparameter configurations, and architectural guidelines are provided to support future research in adversarial RL for cybersecurity. The zero-sum formulation and realistic operational constraints make this environment suitable for studying autonomous defense systems, attacker-defender co-evolution, and transfer learning to real-world network security scenarios.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2510.05157

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.35)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceJun-24-2025

Predicting E-commerce Purchase Behavior using a DQN-Inspired Deep Learning Model for enhanced adaptability

Jain, Aditi Madhusudan

--This paper presents a novel approach to predicting buying intent and product demand in e-commerce settings, leveraging a Deep Q-Network (DQN) inspired architecture. In the rapidly evolving landscape of online retail, accurate prediction of user behavior is crucial for optimizing inventory management, personalizing user experiences, and maximizing sales. We evaluate our model on a large-scale e-commerce dataset comprising over 885,000 user sessions, each characterized by 1,114 features. Our approach demonstrates robust performance in handling the inherent class imbalance typical in e-commerce data, where purchase events are significantly less frequent than non-purchase events. Through comprehensive experimentation with various classification thresholds, we show that our model achieves a balance between precision and recall, with an overall accuracy of 88% and an AUC-ROC score of 0.88. Comparative analysis reveals that our DQN-inspired model offers advantages over traditional machine learning and standard deep learning approaches, particularly in its ability to capture complex temporal patterns in user behavior . This research contributes to the field of e-commerce analytics by introducing a novel predictive modeling technique that combines the strengths of deep learning and reinforcement learning paradigms. Our findings have significant implications for improving demand forecasting, personalizing user experiences, and optimizing marketing strategies in online retail environments. The e-commerce industry has experienced unprecedented growth in recent years, with global sales projected to reach $6.3 trillion by 2024 [1].

artificial intelligence, machine learning, threshold, (17 more...)

doi: 10.17762/ijisae.v13i1s.7419

2506.17543

Genre: Research Report > New Finding (1.00)

Industry:

Retail (1.00)
Information Technology > Services > e-Commerce Services (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tidwell, John Christopher, Tidwell, John Storm

Deep Q-Network (DQN) multi-agent reinforcement learning (MARL) for Stock Trading

arXiv.org Artificial IntelligenceMay-8-2025

This project addresses the challenge of automated stock trading, where traditional methods and direct reinforcement learning (RL) struggle with market noise, complexity, and generalization. Our proposed solution is an integrated deep learning framework combining a Convolu-tional Neural Network (CNN) to identify patterns in technical indicators formatted as images, a Long Short-T erm Memory (LSTM) network to capture temporal dependencies across both price history and technical indicators, and a Deep Q-Network (DQN) agent which learns the optimal trading policy (buy, sell, hold) based on the features extracted by the CNN and LSTM. The CNN and LSTM act as sophisticated feature extractors, feeding processed information to the DQN, which learns the optimal trading policy (buy, sell, hold) through RL. W e trained and evaluated this model on historical daily stock data, using distinct periods for training, testing, and validation. Performance was assessed by comparing the agent's returns and risk on out-of-sample test data against baseline strategies, including passive buy-and-hold approaches. This analysis, along with insights gained from explainability techniques into the agent's decision-making process, aimed to demonstrate the effectiveness of combining specialized deep learning architectures, document challenges encountered, and potentially uncover learned market insights.

artificial intelligence, deep learning, machine learning, (19 more...)

2505.03949

Genre: Research Report (0.64)

Industry:

Health & Medicine (1.00)
Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningMay-6-2025

Universal Approximation Theorem of Deep Q-Networks

Qi, Qian

We establish a continuous-time framework for analyzing Deep Q-Networks (DQNs) via stochastic control and Forward-Backward Stochastic Differential Equations (FBSDEs). Considering a continuous-time Markov Decision Process (MDP) driven by a square-integrable martingale, we analyze DQN approximation properties. We show that DQNs can approximate the optimal Q-function on compact sets with arbitrary accuracy and high probability, leveraging residual network approximation theorems and large deviation bounds for the state-action process. We then analyze the convergence of a general Q-learning algorithm for training DQNs in this setting, adapting stochastic approximation theorems. Our analysis emphasizes the interplay between DQN layer count, time discretization, and the role of viscosity solutions (primarily for the value function $V^*$) in addressing potential non-smoothness of the optimal Q-function. This work bridges deep reinforcement learning and stochastic control, offering insights into DQNs in continuous-time settings, relevant for applications with physical systems or high-frequency data.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2505.02288

Country:

North America > Canada (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)